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1 INTRODUCTION 



ABSTRACT 

We implement an Independent Component Analysis (ICA) algorithm to separate sig- 
nals of different origin in sky maps at several frequencies. Due to its self-organizing 
capability, it works without prior assumptions either on the frequency dependence 
or on the angular power spectrum of the various signals; rather, it learns directly 
from the input data how to identify the statistically independent components, on the 
assumption that all but, at most, one of them have non-Gaussian distributions. 

We have applied the ICA algorithm to simulated patches of the sky at the four 
frequencies (30, 44, 70 and 100 GHz) of the Low Frequency Instrument (LFI) of ESA's 
Planck satellite. Simulations include the Cosmic Microwave Background (CMB), the 
synchrotron and thermal dust emissions and extragalactic radio sources. The effects 
of detectors angular response functions and of instrumental noise have been ignored in 
this first exploratory study. The ICA algorithm reconstructs the spatial distribution of 
each component with rms errors of about 1% for the CMB and of about 10% for the, 
much weaker, Galactic components. Radio sources are almost completely recovered 
down to a flux limit corresponding to ~ 0.7<jcmb, where ucmb is the rms level of 
CMB fluctuations. The signal recovered has equal quality on all scales larger then the 
pixel size. In addition, we show that for the strongest components (CMB and radio 
sources) the frequency scaling is recevered with percent precision. Thus, algorithms 
of the type presented here appear to be very promising tools for component separa- 
tion. On the other hand, we have been dealing here with an highly idealized situation. 
Work to include instrumental noise, the effect of different resolving powers at differ- 
ent frequencies and a more complete and realistic characterization of astrophysical 
foregrounds is in progress. 



Maps produced by large area surveys aimed at imaging pri- 
mordial fluctuations of the Cosmic Microwave Background 
(CMB) contain a linear mixture of signals by several as- 
trophysical and cosmological sources (Galactic synchrotron, 

frac-frco nnrl rbiat «miagir,ng br.tb frnm i-nmparf qnrl rlif- 



A great deal of work has been carried out in recent 
years in this area (see de Oliveira-Costa & Tegmark 1999, 
and references therein; Tegmark et al. 2000). The problem of 
map denoising has b een tackled with t he wavelets analysis 
on the whole spher e (Tenorio et al. 1999) and on sky patches 
( [ganz et al. 1999b ). Algorithms to single out the CMB and 



the various foregrounds h ave been developed ( Bouchet et al 



fuse soilrces, extragalactic sources, Sunvaev-Zeldovich effect 
in clusters of galaxies or by inhomogeneous re-ionization, in 
addition to primary and secondary CMB anisotropies) con- 
volved with the spatial and spectral responses of the antenna 
and of the detectors. In order to exploit the unique cosmo- 
logical information encoded in the CMB anisotropy patterns 
as well as the extremely interesting astrophysical informa- 
tion carried by the foregound signals, we need to accurately 
separate the different components. 



199E ; Hobson et al. 1995 ; ) . In these works, Wiener filtering 



(WF) and the maximum entropy method (MEM) have been 
applied to simulated data from the Planck satellite, taking 
into account the expected performances of the instruments. 
Assuming a perfect knowledge of the frequency dependence 
of all the components, as well as priors for the statistical 
properties of their spatial pattern, these algorithms are able 
to recover the the strongest components, at the best Planck 
resolution. 
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We adopt a rather different approach, considering de- 
noising and deconvolution of the signals on one side and 
component separation on the other as separate steps in the 
data analysis process, and focus here on the latter step only, 
presenting a 'blind separation' method, based on 'Indepen- 
dent Component Analysis' (ICA) techniques. The method 
does not require any a priori assumption on spectral prop- 
erties and on the spatial distribution of the various compo- 
nents, but only that they are statistically independent and 
all but at most one have a non-Gaussian distribution. It is 
important to note that this is in fact the physical system 
we have to deal with: surely all the foregrounds are non- 
Gaussian, while the CMB is expected to be a nearly Gaus- 
sian fluctuation field for most of the candidate theories of 
the early universe. 

The paper is organized as follows. In Section 2 we in- 
troduce the relevant formalism and briefly review methods 
applied in previous works. In Section 3 we outline the ICA 
algorithm in a rather general framework, since it may be 
useful for a variety of astrophysical applications. In Section 
4 we describe our simulated maps. In Section 5 we give some 
details on our analysis and present the results. In Section 6 
we draw our conclusions and list some future developments. 



2 FORMALISM AND PREVIOUS 
APPROACHES 

We assume that the frequency spectrum of radiation compo- 
nents (referred to as sources) is independent of the position 
in the sky. Since we deal here with relatively small patches of 
the sky, we adopt Cartesian coordinates, (£, rj). The function 
describing the i-th source then writes 



Siti'V,") = Si(t,V) - Fit") 



i = 1, 



(1) 



where N is the number of independent sources and Ti(y) is 
the emission spectrum. 

The signal received from the point (£, rj) in the sky is 



N 

E 



(2) 



Suppose that the instrument has M channels, with spectral 
response functions tj{y), j = 1, . . . M centered at different 
frequencies, and that the beam patterns are independent 
of frequency within each passband. Let beam patterns be 
described by the space-invariant PSF's hj(£,r]), so that the 
maps are produced by a linear convolutional mechanism. 
(Note that this is an additional simplifying assumption since 
in real experiments a position dependent defocussing related 
to the chosen scanning strategy may occur.) Then, the map 
yielded by f h channel is: 



hj(£-x,ri- y)tj{y)- 



Sj(x, y)Ti(v)dxdydv + (£, rj) = 

i = l 

j = l,...,M, (3) 



where: 



«j (& V) = ^ «ji • s * ( x h V) i j = h---,M, (4) 

an = I Fi(v)t;j{v)dv , j = 1,...,M; i = 1,...,N , (5) 



* denotes linear convolution and (£,??) represents the in- 
strumental noise. Eq. (m can also be written in matrix form: 



x({,r?) = AB(£,rj) 



(6) 



where the entries of the MxN matrix A are given by Eq. (g) . 

The unknowns of our problem are the N functions 
Si(£,r]), and the data set is made of the M maps Xj(£, rj) 
in Eq. (^). Besides the measured data, we also know the in- 
strument beam-patterns hj(£,rf), and, more or less approx- 
imately (depending on the specific source), the coefficients 
a ]t in Eqj|j). 

Eq. (H) can be easily rewritten in the Fourier space: 



(7) 



where the capital letters denote the Fourier transforms of 
the corresponding lowercase functions, and 

Rji(ijJ(,LJri) = "Hj^jLJriJaji , (8) 

Tij being the Fourier transform of the beam profile hj. 
Eq. (Q) can thus be rewritten in matrix form: 

X = RS + £ . (9) 

The above equation must be satisfied by each Fourier mode 
(u)£,iJ v ), independently. The aim is to recover the true sig- 
nals Sj(wj,6i; r) ) constituting the column vector S. If the ma- 
trix A in Eq. ^ is known exactly then, in the absence of 
noise, the problem reduces to a linear inversion of Eq. Q 
for each Fourier mode. 

In practice, however, Tij vanishes for some Fourier 
mode. For these modes the entire j-th row of the matrix 
R also vanishes, and R may become a non-full-rank matrix. 
An inversion based on statistical approaches built on a priori 
knowledge is thus needed. 

In the following two subsections we briefly describe two 
such approaches, and in the third one we briefly recall a 
technique so far mostly exploited for the denoising problem 
and for extraction of extragalactic sources. 



2.1 The maximum entropy approach 

The Maximum Entropy Method (MEM) for the reconstruc- 
tion of images is based on a Bayesian approach to the prob- 
lem (Skirling 1988, 1989; Gull 1988). Let X be a vector of M 
observations whose probability distribution P(X|S) depends 
on the values of quantities S = Si, Sn- 

Let us P(S) be the prior probability distribution of S, 
telling us what is known about S without knowledge of the 
data. Given the data X, Bayes' theorem states that the con- 
ditional distribution of S (the posterior distribution of S) is 
given by the product of the likelihood of the data, P(X|S), 
with the prior: 



P(SIX) 



P(X|S)P(S) 



(10) 
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where z is a normalization constant. 

An estimator S of the true signal vector can be con- 
structed by maximizing the posterior probability P(SJX) oc 
P(X|S)P(S). However, while the likelihood in Eq. @ is 
easily determined once the noise and signal covariance ma- 
trices are known, the appropriate choice of the prior distri- 
bution for the model considered is a major problem in the 
Bayesian approach: since Bayes' theorem is simply a rule 
for manipulating probabilities, it cannot by itself help us to 
assign them in the first place, so one has to look elsewhere. 
The MEM is a consistent variational method for the assign- 
ment of probabilities under certain types of constraints that 
must refer to the probability distribution directly. 

The Maximum Entropy principle states that if one has 
some information / on which the probability distribution is 
based, one can assign a probability distribution to a proposi- 
tion i such that P(i\I) contains only the information I that 
one actually possesses. This assignment is done by maximiz- 
ing the Entropy 



H = -Y,P(i\I)logP(i\I) 



(11) 



It can be seen that when nothing is known except that 
the probability distribution should be normalized, the Max- 
imum Entropy principle yields the uniform prior. In our case 
the proposition i represents S, and the information I is the 
assumption of signal statistical independence. The standard 
application of the method considered strictly positive signals 
(Skirling 1988, 1989; Gull 1988); the extension to the case of 
CMB temperature fluctuations, which can be both positive 
and negative, was worked out by Hobson et al. (1998). 

The construction of the entropic prior requires, in gen- 
eral, the knowledge of the frequency dependence of the com- 
ponents to be recovered as well as of the signal covariance 
matrix C(k) =< S(k)S^(k) >, with the average taken on 
all the possible realizations. 



2.2 The multifrequency Wiener filtering 

If a Gaussian prior is adopted, the Bayesian appro ach gives 
the multifre quency Wiener filtering (WF) solution ( Bouchet 
ct al. 1999| ). In in this case too an estimator of the signal 



vector is obtained by maximizing the posterior probability 
in Eq. (Jlo|) , given the signal covariance matrix C(k). 

The Gaussian prior probability distribution for the sig- 
nal has the form 



P(S) oc exp(-S t C _1 S) 



(12) 



The estimator S is linearly related to the data vector X 
through the Wiener matrix W = (CT 1 + R^N^R) -1 , 
where R corresponds to the matrix in ([]) and N(k) =< 
e(k)e t (k) > is the noise covariance matrix: 



WX 



(13) 



The W matrix has the role of a linear filter; again, its con- 
struction requires an a priori knowledge of the spectral be- 
havior of the signals. 

This method is endangered by the clear non- 
Gaussianity of foregrounds. 



2.3 Wavelet methods 

The development of wavelet techniques for signal processing 
has been very fast in the last ten years (see, e.g., Jawerth et 



al. 1994). The wavelet approach is conceptually very simple: 



whereas the Fourier transform is highly inefficient in deal- 
ing with the local behavior, the wavelet transform is able to 
introduce a good space-frequency localization, thus provid- 
ing information on the contributions coming from different 
positions and scales. 

In one dimension, we can define the analyzing wavelet 
as ty(x;R, b) = i? -1 ' 2 ?/;[(:r. — b)/R], dependent on two 
parameters, dilation (R) and translation (b); ip(x) is a 
one-dimensional function satisfying the following condi- 
tions: a) J_ ao dxiJj(x) — 0, b) J_ tx dxijj 2 (x) = 1 and c) 
dk \k\~ 1 tp 2 (k) < oo, where ip(k) is the Fourier trans- 
form of ip{x). The wavelet operates as a mathematical 
microscope of magnification R" 1 at the space point b. The 
wavelet coefficients associated to a one-dimensional function 
f(x) are: 



w(R,b) = / dxf(x)<P(x;R,b) 



(14) 



The computationally faster algorithms for the wavelet anal- 
ysis of 2-dimensi onal images a re those based on Multires- 
olution analysis (Mallat 1989 ) or on 2D wavelet analysis 
(Lemarie & Meyer 19861), using tensor products of one- 



dimensional wavelets. The discrete Multiresolution analysis 
entails the definition of a one-di mensional scaling fu nction <j>, 
normalized as dx <f>(x) = 1 ( Ogden et al. 1997 ) . Scaling 



functions act as low-pass filters whereas wavel et functions 



single out one scale. The 2D wavelet method ( 5anz et al 



1999b) is based on two scales, providing therefore more in- 



formation on different resolutions (defined by the product of 
the two scales) than the Multiresolution one. 

Recently, wavelet techniques have been introduced in the 
analysis of CMB data. Denoising of CMB maps has been 
performed on patches of the sky of 12°. 8 x 12°. 8 using ei- 
ther multiresolution techniques ( 5anz et al. 1999a| ) and 2D 
wavelets (fcanz et al. 1999bj| , as well as on the whole celestial 
sphere (Tenorio et al. 199£). As a first step, maps with the 
cosmological signal plus a Gaussian instrumental noise have 
been considered. 

Denoising of CMB maps has been carried out by using 
a signal- independent prescription, the SURE thresholding 
method (Donoho & Johnstone 1995). The results are model 
independent and only a good knowledge of the noise affect- 
ing the observed CMB maps is required, whereas nothing 
has to be assumed on the nature of the underlying field(s). 
Moreover, wavelet techniques are highly efficient in localiz- 
ing noise variations and features in the maps. 

The wavelet method is able to improve the signal-to- 
noise ratio by a factor of 3 to 5; correspondingly, the error 
on C/s derived from denoised maps is about 2 times lower 
than that obtained with the WF method. 

Wavelets were also successfully applied to the detec- 
tion of point sources in CMB maps in the pres ence of the 
cosmolo gical signal and of instrumental noise (Tenorio et 



al. 199S ); more recently, successfull results on source detec- 
tion have als o been obtained in presence of diffuse galactic 
foregrounds (Cayon et al. 2000). The results are compara- 
ble to those obtained with the filtering method presented 
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by ( fTegmark fc de Oliveira-Costa (1998) ) which, however, 
relies on the assumption that all the underlying fields are 
Gaussian. 



3 THE ICA APPROACH 

We present here a rather different approach, characterized 
by the capability of working 'blindly' i.e. without prior 
knowledge of spectral and spatial properties of the signals to 
be separated. The method is of interest for a broad variety 
of signal and image processing applications, i.e. whenever 
a number of source signals are detected by multiple trans- 
ducers, and the transmission channels for the sources are 
unknown, so that each transducer receives a mixture of the 
source signals with unknown scaling coefficients and channel 
distortion. 

In this exploratory study we confine ourselves to the 
case of simple linear combinations of unconvolved sourc e 
signals ( Amari fc Chichocki 1998| ; Bell fc Sejnowski 1995 ). 



The problem can be stated as follows: a set of N signals is 
input to an unknown frequency dependent multiple-input- 
multiple-output linear instantaneous system, whose M out- 
puts are our observed signals. We use the term instantaneous 
to denote a system whose output at a given point only de- 
pends on the input signals at the same point. Our objective 
is to find a stable reconstruction system to estimate the origi- 
nal input signals with no prior assumptions either on the sig- 
nal distributions or on their frequency scalings. The problem 
in its general form is normally unsolvable, and a "working 
hypothesis" must be made. The hypothesis we make is the 
mutual statistical independence of our source signals, what- 
ever their actual distributions are. Several solutions have 
been proposed for this problem, each based on more or less 
sound principles, not all of which are typical of classical sig- 
nal processing. Indeed, information theory, neural networks, 
statistics and probability have played an important part in 
the development of these techniques. 

We do not consider here specific instrumental features 
like beam convolution and noise contamination, leaving the 
specialization of the ICA method to specific experiments to 
future work; this allows us to highlight the capabilities of this 
approach, able to work in conditions where other algorithms 
would not be viable. Therefore, we adopt Eq. (^) as our data 
model, just dropping the tilde accent on vector x. Also, the 
instrumental noise term in Eq. (Q) will be neglected. 

It can be proved that, to solve the problem de scribed 
above, the following hypotheses should be verified (Amari 
fc Chichocki 1998|; |Comon 199^): 



• All the source signals are statistically independent; 

• At most one of them has a Gaussian distribution; 

• M > N; 

• Low noise. 

The last two assumptions can be somewhat relaxed by 
choosing suitable separation strategies. As far as indepen- 
dence is concerned, roughly speaking, we may say that the 
search for an ICA model from non-ICA data (i.e. data not 
coming from independent sources) should give the most 'in- 
terest ing' (namely, the most s tructured) proje ctions of the 
data dHyvarinen fc Oja 1999| ; |Friedman 19871 ). Tnis is not 
equivalent to say that separation is achieved; however, we 



have seen from our experiments that a good separation can 
be obtained even for sources that are not totally indepen- 
dent. The second assumption above tells us that Gaussian 
sources cannot be separated. More specifically, they can only 
be separated up to an orthogonal transformation. In fact, 
it can be shown that the joint probability of a mixture of 
Gaussian signals is invariant to orthogonal transformations. 
This means that if independent components are found from 
Gaussian mixtures, then any orthogonal transformation of 
them gives mutually independent components. 

Many strategies have been adopted to solve the separa- 
tion problem on the basis of the above hypotheses, all based 
on looking for a set of independent signals, which can be 
shown to be the original sources. A formal criterion to test 
independence, from which all the separating strategies can 
be derived, is described later in this section. 

In order to recover the original source signals from the 
observed mixtures, we use a separating scheme in the form 
of a feed- forward neural network. The observed signals are 
input to an N x M matrix W, referred to as the the synaptic 



weight matrix, whose adjustable entries, Wi 



l,...N,j 



1, . . . M, are updated for every sample of the input vector 
x(£, n) (at step r) following a suitable learning algorithm. 
The output of matrix W at step r will be: 



u(&»7,t) = W(t)x(£,»,) 



(15) 



W(t) is expected to converge to a true separating matrix, 
that is, a matrix whose output is a copy of the inputs, for 
every point (£,??)• Ideally, this final matrix W should be 
such that W A = 7, where / is the N x N identity. As an 
example, if M = N, we should have W = A~ . There are, 
however, two basic indeterminacies in our problem: ordering 
and scaling. Even if we are able to extract N independent 
sources from M linear mixtures, we cannot know a priori the 
order in which they will be arranged, since this corresponds 
to unobservable permutations of the columns of matrix A. 
Moreover, the scales of the extracted signals are unknown, 
because when a signal is multiplied by some scalar constant, 
the effect is the same as of multiplying by the same constant 
the corresponding column of the mixing matrix. This means 
that W(r) will converge, at best, to a matrix W such that: 

WA = PD , (16) 

where P is any N x N permutation matrix, and D is a 
nonsingular diagonal scaling matrix. From Eqs. ([]), (|l^) and 
( p^| ) we thus have: 

u = Wx = WAs = PDs . (17) 

That is, as anticipated, each component of u is a scaled 
version of a component of s, not necessarily in the same or- 
der. This is not a serious inconvenience in our application, 
since we should be able to recover the proper scales for the 
separated sources from other pieces of information, for ex- 
ample matching with independent lower resolution observa- 
tions like those of COBE on the case of MAP and Planck. 
If A was known, the performance of the separation algo- 
rithm could be evaluated by means of the matrix W A. If 
the separation is perfect, this matrix has only one nonzero 
element for each row and each column. In any non-ideal sit- 
uation each row and column of W A should contain only one 
dominant element. 

In all the cases treated here we assume M > N, but we 
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consider the case where N, although smaller than M, is not 
known. 

The mutual statistical independence of the source sig- 
nals can be expressed in terms of a separable joint probabil- 
ity density function q(s): 



(18) 



3=1 



where qj{sj) is the marginal probability density of the j th 
source. 

Various algorithms can be used to learn the matrix W . 
All these algorithms can be derived from a unified principle 
based on the Kullback-Leibler (KL) divergence between the 
joint probability density of the output vector u, pu(u), and 
a function q(u), which should be suitably chosen among the 
ones of the type of Eq. (|l^) . The KL divergence between the 
two functions mentioned above may be written as a function 
of the matrix W , and can be considered as a cost function 
in the sense of Bayesian statistics: 



R(W) = 



Pi/(u)log du 
1 " 



(19) 



It can be proved that, under mild conditions on q(u), R(W) 
has a global minimum where W is such that W A = PD. The 
different possible choices for q(s) lead to the differe nt par- 
ticular learning strategies proposed in the literature (|Amari 



Chiciocki 1998; Yang & Amari 1997; Bell & Sejnowski 



1995) 



The uniform gradient search method, which is a 
gradient-type algorithm, takes into account the Riemannian 
metric structure of our objective parameter space, which is 



the set of all nonsingular matrices W (Amari & Chichocki 
199 8|). j n a general case, where the number N of sources is 
only known to be smaller than the number of observations, 
the following formula is derived: 

W(t + 1) = W(t) + 

+ a(r) • [A - u(r)u T (r) - f (u(r))u T (r)] W(r) , (20) 

where A is a M x M diagonal matrix: 

A = diag[(«i + /i(ui))iu] . . . [(% + /m(«m))iim] ■ (21) 

Pixel by pixel, the M x M matrix W is multiplied by the 
M- vector x, and gives vector u as its output. This output 
is transformed through the nonlinear vector function f(u), 
and the result is combined with u itself to build the up- 
date to matrix W, through Eq. (poj). The process has to 
be iterated by reading the data maps several times. If TV is 
strictly smaller than M, then M — N outputs can be shown 
to rapidly converge to zero, or to pure noise functions. 

The convergence properties of this iterative formula are 
shown to be independent of the particular matrix A, so 
that, even a strongly ill-conditioned system does not affect 
the convergence of the learning algorithm. In other words, 
even when the contributions from some components are very 
small, there is no problem to recover them. This property is 
called the equivariant property since the asymptotic proper- 
ties of the algorithm are independent of the mixing matrix. 
The r-dependent parameter a is the learning rate; its value 
is normally decreased during the iteration. As far as the 
choice of a(r) is concerned, a strategy to learn it and its 



annealing scheme is given in Amari et al. (1998); we have 
chosen oiir) decreasing from 10 -3 to 10 -4 linearly with the 
number of iterations. 

The final problem is how to choose the function f(u). 
If we know the true source distributions qj(uj), the best 
choice is to make fj{Uj) = Qj{uj), since this gives the maxi- 
mum likelihood estimator. However, the point is that when 
qj(uj) are specified incorrectly, the algorithm gives the cor- 
rect answer under certain conditions. In any case, the choice 
of f (u) should be made to ensure the existence of an equi- 
librium point for the cost function and the stability of the 
optimization algorithm. These requirements can be satisfied 
even though the nonlinearities chosen are not optimal. A 
suboptimal choice for sub-Gaussian source signals (negative 
kurtosis), is: 



fi(ui) = fiui + m\ui 



(22) 



and, for super-Gaussian source signals (positive kurtosis): 



fi(ui) — /3ui + tanh(7iti) 



(23) 



where j3 > and 7 > 2; if one source is Gaussian, the above 
choices remain viable as well. In our case, we verified that 
all the source functions except CMB are super-Gaussian, 
and thus we implemented the learning algorithm following 
Eq.(|2o|), with the nonlinearities in Eq. (p3|), and (3 = 0, 
7 — 2. As already stated, the mean of the input signal at 
each freque ncy is subtracted. In previous works (Yang & 



Amari 1997) the initial matrix was chosen as W <x I; in that 



analysis, the image data consisted of a set of components 
with nearly the same amplitude. The initial guess for W 
affects the computation time, as well as the scaling of the 
reconstructed signals and their order. Interestingly, we found 
that adjusting the diagonal elements so that they roughly 
reflect the different weights of the components in the mixture 
can speed-up the convergence. For the problem at hand, 
the results shown in § |B| have been obtained starting from 
W =diag[l,3,30,10], for the case of a 4 x 4 VU-matrix, and 
using only 20 learning steps: the time needed was about 1 
minute on a UltraSparc machine, equipped with an 300 MHz 
UltraSparc processor, 256 MBytes RAM, running down SUN 
Solaris 7 Operating System, compiling the FORTRAN 90 code 
using SUN Fortran Workshop 5.0 



4 SIMULATED MAPS 

We produced simulated maps of the antenna temperature 
distribution with 3'. 5 pixel size of a 15° x 15° region cen- 
tered at I = 90° , b = 45° , a t the four central frequ encies of 
the Planck/LFI channels ( Mandolesi et al. 1998 ), namely 
30, 44, 70 a nd 100 GHz (Fig. |l|) The HEALPix pixeliza- 
tion scheme (see Gorski et al. 199£) was adopted. The maps 
include CMB anisotropies, Galactic synchrotron and dust 
emissions, and extragalactic radio sources. 

CMB fluctuations correspond to a flat Cold Dark Mat- 
ter (CDM) model (Qcdm = -95, Qb = -05, three massless 



neutrino species), no rmalized to the COBE data (^ee Seljak 



k, Zaldarriaga 1996 ) . As it is well known, the CMB spec- 
trum, in terms of antenna temperature, writes: 



antenna t i- \ thermod. / j- \ 

SCMB (C.^fJ = S CMB {t,V! 



(e* - l) 2 ' 



(24) 
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v/56.8 and v is the f requency in GHz 



thermod. ^ ^ j g frequency independent (|Fixsen et al. 1996|) 



where v 

therrr 
S CMB 

As for Galactic synchrotron emission, we have extrap- 
olated the 408 MHz map with about 1 degree resolution 
(Haslam et al. 1982), assuming a power law spectrum, in 
terms of antenna temperature: 



~F syn OC ^ 



(25) 



with spectral index n s = 2.9. 

The dust emission maps with about 6' resolution con- 
structed by Schlegel et al. (1998) combining IRAS and 
DIRBE data have been used as templates for Galactic dust 
emission. The extrapolation to Planck/LFI frequencies was 
done assuming a grey-body spectrum: 



-m+l 



Tdust oc 



(26) 



with m = 2, v — hv/kTdust, Tdust being the dust temper- 
ature. Although, in general, Tdust varies across the sky, it 
turns out to be approximately constant at about 18 K in 
the region considered here; we have therefore adopted this 
value in the above equation. 

Because of the lack of a suitable template, we have ig- 
nored here free-free emission, which may be important par- 
ticularly at 70 and 100 GHz. This component needs to be 
included in future work. 

The model by Toffolatti et al. (1998) was adopted for 



extraga |lactic radio sources, assumed to have a Poisson dis- 

-±9- 



tributid n. An antenna tomporauro opoctral index Hts 

was adopted (T^ oc u~ nis ). 



5 BLIND ANALYSIS AND RESULTS 

As it is well known, the strongest signals at the Planck/LFI 
frequencies come from the CMB and from radio sources (al- 
though the latter show up essentially as a few high peaks), 
whereas synchrotron emission and thermal dust are roughly 
1 or 2 orders of magnitude lower, depending on frequency. 
Thus we are testing the performances of the ICA algorithm 
with four signals exhibiting very different spatial patterns, 
frequency dependences and amplitudes. 

Since we are interested in the fluctuation pattern, the 
mean of the total signal (sum of the four components) is 
set to zero at each frequency. We adopt a "blind" approach: 
no information on either the spatial distribution or the fre- 
quency dependence of the signals is provided to the algo- 
rithm. 

The reconstructed maps of the the four components are 
shown in Fig. ^| Several interesting features may be noticed. 
The order of the plotted maps is permuted with respect to 
the input maps in Fig. ^, reflecting the order of the ICA 
outputs: the first output is synchrotron, the second repre- 
sents radio sources, the third is CMB and the fourth is dust. 
All the output maps look very similar to the true ones; even 
synchrotron lower resolution pixels have been reproduced. 
In Figs. and [] we analyze the goodness of the separa- 

tion by comparing power spectra and showing scatter plots 
between the inputs and the outputs. 



5.1 Signal reconstruction 

For each map, we have computed the angular power spec- 
trum, defined by the expansion coefficients Ce of the two 
point correlation function in Legendre polynomials. As is 
well known, it can conveniently be expressed in terms of the 
coefficients of the expansion of the signal S into spherical 
harmonics, S(9,</>) = ^ lm at m Y lm {9,(j)): 

1 .o. 



C e = 



21 + 1 



(27) 



Such coefficients are useful because from elementary prop- 
erties of the Legendre polynomials it can be seen that the 
coefficient Ce quantifies the amount of perturbation on the 
angular scale given by 9 ~ 180/£ degrees. 

The panels on the top of Figs. ^, || ^| show the power 
spectra of the input (left) and output (right) signals. The 
CMB exhibits the characteristic peaks on sub-degree angu- 
lar scales due to acoustic oscillations of the photon-baryon 
fluid at decoupling; the dashed line represents the theoret- 
ical model from which the map was generated, while the 
solid line is the power spectrum of our simulated patch: the 
difference between the two curves is due to the sample vari- 
ance corresponding to the CMB Gaussian statistics. Radio 
sources are completely different, having all the power on 
small scales with the typical shot noise spatial pattern; dust 
and synchrotron emissions have power dec reasing on small 



scale s roughly as a powe r law, as expected ( Mandolesi et al 



1995 ; Puget et al. 1995 ) . The left-hand side panels on the 



bottom show the quality factor, defined as the ratio between 
true and reconstructed power spectrum coefficients, for each 
multipole I. Due to the limited size of the analyzed region, 
the power spectrum can be defined on scales below roughly 
2°. The bottom right-hand side panels are scatter plots of 
the ICA results: for each pixel of the maps, we plotted the 
value of the reconstructed image vs. the corresponding input 
value. 

The reconstructed signals have zero mean and are in 
unit of the constant d multiplying each output map, pro- 
duced during the separation phase, as mentioned in § 3: the 
scale of each signal is unreproducible for a blind separa- 
tion algorithm like ICA. Nevertheless, a lot of information 
is encoded into the spatial pattern of each signals, and ulti- 
mately its overall normalization could be recovered exploit- 
ing data from other experiments. Therefore, the relation be- 
tween each true signal and its reconstruction is 



= d-S° U +b , i = 1, Npixe 



(28) 



where b represents merely the mean of the input signal, that 
is zero for the CMB and some positive value for the fore- 
grounds. 

To quantify the quality of the reconstruction, we have 
recovered d and 6 by performing a linear fit of output to 
input maps (s ln ,s° ut ) for each signal: 



d- 



d-s" 



(29) 



where the sums run over all the pixels, and the bar indicates 
the average value over the patch; the values of d and b, as 
well as the linear fits (dashed lines), are indicated for all the 
signals in the scatter plot panels. Also, in the same panels 
we show the standard deviation of the fit, that is 
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Table 1. Input and output frequency scalings of the various components. 



Frequency Radio sources CMB synchrotron dust 



(GHz) 


input 


output 


input 


output 


input 


output 


input 


output 


100 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


70 


1.97 


1.95 


1.14 


1.14 


2.81 


1.36 


0.68 


0.93 


11 


4.76 


4.70 


1.22 


1.23 


10.8 


1.72 


0.35 


1.93 


30 


9.86 


9.70 


1.26 


1.26 


32.8 


-12. 


0.19 


3.77 



N, 



pixels 



1/2 



(30) 



A comparison of such quantity with the input signals (bot- 
tom right-hand side panels) gives an estimate of the good- 
ness of the reconstruction. CMB and radio sources are re- 
covered with percent and 0.1% precision, respectively, while 
the accuracy drops roughly to 10% for the (much weaker) 
Galactic components, synchrotron and dust. Also, the lat- 
ter appear to be slightly mixed; this is likely due to the fact 
that they are somewhat correlated so that the hypothesis of 
statistical independence is not properly satisfied. 

We have also tested to what extent the counts of radio 
sources are recovered. This was done in terms of the relative 
flux 



As = s/s n 



(31) 



Smax being the flux of the brightest source. 

In Fig. [?] we show the cumulative number of input 
(dashed) and output (solid line) pixels exceeding a given 
value of As. The algorithm correctly recovers essentially all 
sources with As > 2 x 10~ 2 , corresponding to a signal of 
T s ~ 50 fiK, or to a flux density S = (2k B T a /\ 2 ) Afi ~ 
15 mjy, where ks the Boltzmann constant, A the wave- 
length and AQ the solid angle covered by a pixel, that is 
3.5' x 3.5' ~ 10~ 6 sr. At fainter fluxes the counts are overes- 
timated; this is probably due to the contamination from the 
other signals. In any case, the flux limit for source detection 
is surprisingly low, even lower of the rms CMB fluctuations 
{ocmb — 70/iK at the resolution limit of our maps), sub- 
stantially lower or at least comparable to that achieve d with 
other methods which require st ronger assumptions (dayon 



et al. 2 )0C 



Hobson et al. 1998). This high efficiency in de- 



tecting point sources illustrates the ability of the method 
in taking the maximum advantage of the differences in fre- 
quency and spatial properties of the various components. 

On the other hand, we stress that our approach is ide- 
alized in a number of aspects: beam convolution and instru- 
mental noise have not been taken into account, and the same 
frequency scaling has been assumed for all radio sources. 
Therefore more detailed investigations are needed to esti- 
mate a realistic source detection limit. 

Finally note that the quality of the separation is similar 
on all scales, as shown by the bottom left-hand side panels 
of Figs. |^, |f| ^. The exception are radio sources, whose 
true power spectrum goes to zero at low ^'s more rapidly 
than the reconstructed one. 



5.2 Reconstruction of the frequency dependence 

Another asset of this technique is the possibility of recover- 
ing the frequency dependence of individual components. The 
outputs can be written as u = W~x., where x = As. As previ- 
ously mentioned, in the ideal case W A would be a diagonal 
matrix containing the constants d for all the signals, multi- 
plied by a permutation matrix. It can be easily seen that, 
if this is true, the frequency scalings of all the components 
can be obtained by inverting the matrix W and perform- 
ing the ratio, column by column, of each element with the 
one corresponding to the row corresponding to a given fre- 
quency. However, as pointed out in § 3, if some signals are 
much smaller than others the above reasoning is only ap- 
proximately valid. This is precisely what is happening in our 
case: we are able to accurately recover the frequency scaling 
of the strongest signals, CMB and radio sources, while the 
others are lost (see Table hi). 



6 CONCLUDING REMARKS AND FUTURE 
DEVELOPMENTS 

We have developed a neural network suitable to implement 
the Independent Component Analysis technique for sepa- 
rating different emission components in maps of the sky at 
microwave wavelengths. The algorithm was applied to sim- 
ulated maps of a 15° x 15° region of sky at 30, 44, 70, 100 
GHz, corresponding to the frequency channels of Planck's 
Low Frequency Instrument (LFI). 

Simulations include the Cosmic Microwave Background, 
extragalactic radio sources and Galactic synchrotron and 
thermal dust emission. The various components have 
markedly different angular patterns, frequency dependences 
and amplitudes. 

The technique exploits the statistical independence of 
the different signals to recover each individual component 
with no prior assumption either on their spatial pattern or 
on their frequency dependence. The great virtue of this ap- 
proach is the capability of the algorithm to learn how to re- 
cover the independent components in the input maps. The 
price of the lack of a priori information is that each signal 
can be recovered multiplied by an unknown constant pro- 
duced during the learning process itself. However this is not 
a substantial limitation, since a lot of physics is encoded in 
the spatial patterns of the signals, and ultimately the right 
normalization of each component can be obtained by resort- 
ing to independent observations. 

The results are very promising. The CMB map is re- 
covered with an accuracy at the 1% level. The algorithm 
is remarkably efficient also in the detection of extragalactic 
radio sources: almost all sources brighter then 15 mjy at 
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100 GHz (corresponding to ~ O.Ioqmb, ctcmb being the 
rms level of CMB fluctuations on the pixel scale) are re- 
covered; on the other hand, it must be stressed that is not 
directly indicative of what can be achieved in the analysis of 
Planck/LFI data because the adopted resolution (3'. 5 x 3'. 5) 
is much better than that of the real experiment, instrumen- 
tal noise has been neglected and the same spectral slope was 
assumed for all sources. 

Also the frequency dependences of the strongest com- 
ponents are correctly recovered (error on the spectral index 
of 1% for the CMB and extragalactic sources). 

Maps of subdominant signals (Galactic synchrotron and 
dust emissions) are recovered with rms errors of about 10%; 
their spectral properties cannot be retrieved by our tech- 
nique. 

The reconstruction has equal quality on all the scales 
of the input maps, down to the pixel size. 

All this indicates that this technique is suitable for a 
variety of astrophysical applications, i.e. whenever we want 
to separate independent signals from different astrophysical 
processes occurring along the line of sight. 

Of course, much work has to be done to better explore 
the potential of the ICA technique. It has to be tested under 
more realistic assumptions, taking into account instrumental 
noise and the effect of angular response functions as well as 
including a more complete and accurate characterization of 
foregrounds. 

In particular, the assumption that the spectral proper- 
ties of each foreground component is independent of position 
will have to be relaxed to allow for spectral variations across 
the sky. Also, it will be necessary to deal with the fact that 
Galactic emissions are correlated. 

The technique is flexible enough to offer good prospects 
in this respect. In the learning stage, the ICA algorithm 
makes use of non-linear functions that, case by case, are 
chosen to minimize the mutual information between the out- 
puts; improvements could be obtained by specializing the 
ICA inner non-linearities to our specific needs. Also, it is 
possible to take properly into account our prior knowledge 
on some of the signals to recover, still taking advantage as 
far as possible of the ability of this neural network approach 
to carry out a "blind" separation. Work in this direction is 
in progress. 
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We also thank Krzysztof M. Gorski and all the people who 
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and MURST. LT acknowledges financial support from the 
Spanish DGES, projects ESP98-1545-E and PB98-0531- 
C02-01. 
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Figure 1. Inputs maps used in the ICA separation algorithm: from top left, in a clockwise sense, simulations of CMB, synchrotron, 
radio sources and dust emission are shown. Radio sources and dust grey scale are is non-linear to better show the signal features. 



Figure 2. Reconstructed maps produced by the ICA method; the initial ordering has not been conserved in the outputs. From top left, 
in a clockwise sense, we can recognize synchrotron, radio sources, dust and CMB. Radio sources and dust grey scale is non-linear as in 
Fig.l. 



Figure 3. Top left: input angular power spectra, simulated (solid line) and theoretical (dashed line, see text). Top right: the angular 
power spectrum of the reconstructed CMB patch. Bottom left: quality factor relative to the input/output angular spectra. Bottom right: 
scatter plot and linear fit (dashed line) for the CMB input/output maps. 



Figure 4. Top panels: angular power spectra for the simulated input (left) and reconstructed (right) synchrotron map. Bottom left: 
quality factor relative to the input/output angular spectra. Bottom right: scatter plot and linear fit (dashed line) for the synchrotron 
input/output maps. 



Figure 5. Top panels: angular power spectra for the simulated input (left) and reconstructed (right) dust emission map. Bottom left: 
quality factor relative to the input /output angular spectra. Bottom right: scatter plot and linear fit (dashed line) for the dust input /output 
maps. 
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Figure 6. Top panels: angular power spectra for the simulated (left) and reconstructed (right) radio source map. Bottom left: quality 
factor relative to the input/output angular spectra. Bottom right: scatter plot and linear fit (dashed line) for the radio source emission 
input/output maps. 

Figure 7. Cumulative number of pixels as a function of the threshold As (see text for more details): input (dashed line) versus output 
(solid line). 
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